Combining Granularity-based Topic-Dependent and Topic-Independent Evidences for Opinion Detection
Author
Abstract
Opinion mining is a subdiscipline of Information Retrieval (IR) and Computational Linguistics. It refers to computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in online sources such as news articles, social media comments, and other user-generated content. It is also known by many other names, including opinion finding, opinion detection, sentiment analysis, sentiment classification, and polarity detection. Defined more specifically and simply, opinion mining is the task of retrieving opinions on an issue that the user expresses in the form of a query. The field of opinion mining poses many problems and challenges; in this thesis, we focus on some of the major ones.

One of the foremost challenges of opinion mining is to find opinions that are specifically relevant to the given topic (query). A document can discuss many topics at once, and it may contain opinionated text about every topic it discusses or about only a few of them. Therefore, it is essential to select topic-relevant document segments together with their corresponding opinions. We approach this problem at two levels of granularity: sentences and passages. In our first approach, at the sentence level, we use semantic relations from WordNet to find this opinion-topic association. In our second approach, at the passage level, we use a more robust IR model (i.e., a language model) to address the problem. The basic idea behind both contributions to opinion-topic association is that a document containing more opinionated, topic-relevant textual segments (i.e., sentences or passages) is more opinionated than a document containing fewer such segments.

Most machine-learning-based approaches to opinion mining are domain-dependent (i.e., their performance varies from domain to domain).
On the other hand, a domain-independent or topic-independent approach is more general and can sustain its effectiveness across different domains. However, topic-independent approaches generally suffer from poor performance. Developing an approach that is both effective and general at the same time is a major challenge in the field of opinion mining. Our contribu-
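The abstract states the basic idea of the passage-level approach (more opinionated, topic-relevant passages means a more opinionated document) but not its exact formulas. The following Python sketch illustrates that idea under loudly stated assumptions: passages are fixed-size, non-overlapping word windows; a passage counts as topic-relevant when a Dirichlet-smoothed query-likelihood language model assigns the query a higher probability than the background collection model does; a passage counts as opinionated when it contains a word from a tiny lexicon; and the document score is the number of passages that are both. The lexicon `OPINION_WORDS`, the window `size`, the smoothing parameter `mu`, the relevance criterion, and all function names are hypothetical choices for illustration, not the thesis's actual method. The sentence-level WordNet approach is not shown.

```python
import math
from collections import Counter
from collections.abc import Callable

# Hypothetical toy opinion lexicon (an assumption, not the thesis's resource).
OPINION_WORDS = {"good", "bad", "great", "terrible", "love", "hate", "excellent", "poor"}


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer that strips common punctuation."""
    tokens = (t.strip(".,!?;:\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def passages(tokens: list[str], size: int) -> list[list[str]]:
    """Split tokens into consecutive, non-overlapping windows of `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def log_query_likelihood(query: list[str], passage: list[str],
                         coll_prob: Callable[[str], float], mu: float) -> float:
    """Dirichlet-smoothed log P(query | passage language model)."""
    tf = Counter(passage)
    return sum(
        math.log((tf[w] + mu * coll_prob(w)) / (len(passage) + mu))
        for w in query
    )


def opinion_score(doc_text: str, query_text: str, collection_texts: list[str],
                  size: int = 8, mu: float = 100.0) -> int:
    """Count passages of a document that are both topic-relevant and opinionated."""
    query = tokenize(query_text)
    coll = Counter(t for text in collection_texts for t in tokenize(text))
    total = sum(coll.values())
    vocab = len(set(coll) | set(query))

    def coll_prob(w: str) -> float:
        # Add-one smoothed background model so unseen query terms get nonzero mass.
        return (coll[w] + 1) / (total + vocab)

    # Log-likelihood of the query under the background (collection) model.
    background = sum(math.log(coll_prob(w)) for w in query)

    count = 0
    for p in passages(tokenize(doc_text), size):
        # Relevant if the passage model explains the query better than the background.
        relevant = log_query_likelihood(query, p, coll_prob, mu) > background
        opinionated = any(t in OPINION_WORDS for t in p)
        if relevant and opinionated:
            count += 1
    return count


doc_a = "The new phone camera is great. I love the phone battery too. Shipping was slow."
doc_b = "The phone has a camera and a battery. It ships in a box."
collection = [doc_a, doc_b]
print(opinion_score(doc_a, "phone camera", collection))  # 1
print(opinion_score(doc_b, "phone camera", collection))  # 0
```

Both toy documents mention the query topic, but only `doc_a` has a topic-relevant passage that also carries opinion words, so it ranks higher. Counting such passages mirrors the stated idea most directly; weighting each passage by its relevance or opinion density would be an equally plausible reading and is not claimed to be the thesis's choice.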
Similar resources
WIDIT in TREC 2007 Blog Track: Combining Lexicon-Based Methods to Detect Opinionated Blogs
In TREC-2007, Indiana University's WIDIT Lab participated in the Blog track's opinion task and the polarity subtask. For the opinion task, whose goal is to "uncover the public sentiment towards a given entity/target", we focused on combining multiple sources of evidence to detect opinionated blog postings. Since detecting opinionated blogs on a given topic (i.e., entity/target) involves not o...
WIDIT in TREC 2008 Blog Track: Leveraging Multiple Sources of Opinion Evidence
Indiana University's WIDIT Lab participated in the Blog track's opinion task and the polarity subtask, where we combined multiple opinion detection methods to leverage a variety of complementary evidence rather than trying to optimize the use of a single source of evidence. To address the weakness of our past topical retrieval strategy, which generated mediocre baseline results with ...
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
Hot Topic Detection in News Blogs from the Perspective of W2T
News blog hot topics are important for information recommendation services and marketing. However, information overload and personalized management make organizing information more difficult. Moreover, little attention has been paid to what influences the formation and development of blog hot topics. In order to correctly detect news blog hot topics, the paper first analyzes the development o...
A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...